Search for: All records
Creators/Authors contains: "Olivier, Stephen"

  1. Kokkos provides in-memory advanced data structures, concurrency, and algorithms to support performance-portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message passing model for inter-node communication. Many programmers use both Kokkos and MPI together. In this paper, Kokkos is integrated within an MPI implementation for ease of use in applications that use both Kokkos and MPI, without sacrificing performance. For instance, this model allows passing first-class Kokkos objects directly to extended C++-based MPI APIs. We prototype this integrated model using ExaMPI, a C++17-based subset implementation of MPI-4. We then demonstrate use of our C++-friendly APIs and Kokkos extensions through benchmarks and a mini-application. We explain why direct use of Kokkos within certain parts of the MPI implementation is crucial to performance and enhanced expressivity. Although the evaluation in this paper focuses on CPU-based examples, we also motivate why making Kokkos memory spaces visible to the MPI implementation generalizes the idea of “CPU memory” and “GPU memory” in ways that enable further optimizations in heterogeneous Exascale architectures. Finally, we describe future goals and show how these mesh both with a possible future C++ API for MPI-5 as well as with the potential to accelerate MPI on such architectures.
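
     As a concrete illustration of the model described above, the following is a minimal C++ sketch of passing a first-class Kokkos object (here a contiguous host-space Kokkos::View) to an MPI send through a thin wrapper. The wrapper name kokkos_send and its trivial datatype mapping are assumptions made for this sketch; they are not ExaMPI's actual extended API.

     ```cpp
     // Hedged sketch: a thin wrapper that lets an application hand a Kokkos::View
     // directly to MPI, in the spirit of the Kokkos-aware APIs the paper describes.
     // The wrapper and datatype mapping below are illustrative assumptions.
     #include <Kokkos_Core.hpp>
     #include <mpi.h>

     // Map a few element types to MPI datatypes (contiguous views assumed).
     template <typename T> MPI_Datatype mpi_type();
     template <> MPI_Datatype mpi_type<double>() { return MPI_DOUBLE; }
     template <> MPI_Datatype mpi_type<int>()    { return MPI_INT; }

     // Send the contents of a contiguous Kokkos view to rank `dest`.
     template <class ViewType>
     void kokkos_send(const ViewType& v, int dest, int tag, MPI_Comm comm) {
       // A Kokkos-aware MPI implementation could inspect v's memory space here
       // and choose a CPU or GPU data path; this sketch just uses the raw pointer.
       MPI_Send(v.data(), static_cast<int>(v.size()),
                mpi_type<typename ViewType::value_type>(), dest, tag, comm);
     }

     int main(int argc, char* argv[]) {
       MPI_Init(&argc, &argv);
       Kokkos::initialize(argc, argv);
       {
         int rank;
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         // Host-space view so the raw pointer is valid for any MPI implementation.
         Kokkos::View<double*, Kokkos::HostSpace> buf("buf", 1024);
         if (rank == 0) {
           kokkos_send(buf, 1, /*tag=*/0, MPI_COMM_WORLD);
         } else if (rank == 1) {
           MPI_Recv(buf.data(), 1024, MPI_DOUBLE, 0, 0,
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         }
       }
       Kokkos::finalize();
       MPI_Finalize();
       return 0;
     }
     ```

     Run with two MPI ranks. With Kokkos memory spaces visible to the MPI implementation, as the paper motivates, the same call could dispatch a GPU-resident view without an explicit host copy.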
  2. Centered on modern C++ and the SYCL standard for heterogeneous programming, Data Parallel C++ (DPC++) and Intel's oneAPI software ecosystem aim to lower the barrier to entry for the use of accelerators like FPGAs in diverse applications. In this work, we consider the usage of FPGAs for scientific computing, in particular with the multigrid solver MueLu. We report on early experiences implementing kernels of the solver in DPC++ for execution on Stratix 10 FPGAs, and we evaluate several algorithmic design and implementation choices. These choices not only impact performance but also shed light on the capabilities and limitations of DPC++ and oneAPI.
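
     To make the kernel work concrete, here is a minimal DPC++/SYCL sketch of a damped-Jacobi relaxation sweep for a 1D Poisson stencil, a typical multigrid smoother. It is illustrative only: it is not MueLu code, and a real Stratix 10 build would select an FPGA device and add pipelining and unrolling attributes.

     ```cpp
     // Hedged sketch: one weighted-Jacobi sweep in DPC++/SYCL, the kind of
     // smoother kernel a multigrid solver applies. Not MueLu's implementation.
     #include <sycl/sycl.hpp>

     int main() {
       constexpr size_t n = 1024;
       constexpr double omega = 2.0 / 3.0;  // damping factor (assumed value)
       sycl::queue q;  // default device; an FPGA target would use an FPGA selector

       double* x  = sycl::malloc_shared<double>(n, q);  // current iterate
       double* b  = sycl::malloc_shared<double>(n, q);  // right-hand side
       double* xn = sycl::malloc_shared<double>(n, q);  // next iterate
       for (size_t i = 0; i < n; ++i) { x[i] = 0.0; b[i] = 1.0; }

       // One sweep for the 1D Poisson stencil [-1, 2, -1]; diagonal entry is 2.
       q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> idx) {
         size_t i = idx[0];
         double left  = (i > 0)     ? x[i - 1] : 0.0;
         double right = (i < n - 1) ? x[i + 1] : 0.0;
         double jac   = (b[i] + left + right) / 2.0;
         xn[i] = (1.0 - omega) * x[i] + omega * jac;
       }).wait();

       sycl::free(x, q); sycl::free(b, q); sycl::free(xn, q);
       return 0;
     }
     ```

     On an FPGA, design choices such as loop pipelining and memory banking for the stencil reads are exactly the kind of algorithmic and implementation decisions the abstract says were evaluated.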
  3. Technologies such as Multi-Channel DRAM (MCDRAM) and High Bandwidth Memory (HBM) provide significantly more bandwidth than conventional memory. This trend has raised questions about how applications should manage data transfers between memory levels. This paper focuses on evaluating different usage modes of the MCDRAM in Intel Knights Landing (KNL) manycore processors. We evaluate these usage modes with a sorting kernel and a sorting-based streaming benchmark. We develop a performance model for the benchmark and use experimental evidence to demonstrate the correctness of the model. The model projects near-optimal numbers of copy threads for memory-bandwidth-bound computations. On KNL, we demonstrate up to a 1.9X speedup for sort, relative to an OpenMP GNU sort that does not use MCDRAM, even when the problem does not fit in MCDRAM.
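
     As one concrete example of an MCDRAM usage mode, the sketch below uses the memkind library's hbwmalloc interface to place a sort buffer explicitly in high-bandwidth memory (flat mode), falling back to DDR when MCDRAM is absent. The problem size and fallback policy are assumptions for illustration; this is not the paper's benchmark code.

     ```cpp
     // Hedged sketch: explicit (flat-mode) MCDRAM allocation on KNL via
     // memkind's hbwmalloc interface. In cache mode, by contrast, MCDRAM is
     // managed transparently by hardware and no code change is needed.
     #include <hbwmalloc.h>   // memkind high-bandwidth-memory allocator
     #include <algorithm>
     #include <cstdio>
     #include <cstdlib>
     #include <random>

     int main() {
       const size_t n = size_t(1) << 24;  // 16M doubles (~128 MiB), fits in MCDRAM
       const bool have_hbw = (hbw_check_available() == 0);

       // Place the bandwidth-critical buffer in MCDRAM if present, else in DDR.
       double* data = have_hbw
           ? static_cast<double*>(hbw_malloc(n * sizeof(double)))
           : static_cast<double*>(std::malloc(n * sizeof(double)));

       std::mt19937_64 rng(42);
       std::uniform_real_distribution<double> dist(0.0, 1.0);
       for (size_t i = 0; i < n; ++i) data[i] = dist(rng);

       std::sort(data, data + n);  // the bandwidth-sensitive kernel

       std::printf("min=%f max=%f\n", data[0], data[n - 1]);
       if (have_hbw) hbw_free(data); else std::free(data);
       return 0;
     }
     ```

     Link with -lmemkind. A parallel baseline like the abstract's OpenMP GNU sort would swap std::sort for __gnu_parallel::sort from <parallel/algorithm>.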